
Synthetic Biology

Oxford University Press (OUP)

Preprints posted in the last 90 days, ranked by how well they match Synthetic Biology's content profile, based on 21 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Quantitative modeling reveals sources of variability in transcriptional activation assays

Greenwood, M.; Reardon, K. F.; Prasad, A.

2026-01-30 synthetic biology 10.64898/2026.01.30.702786 medRxiv
Top 0.1%
12.9%

Reporter cell assays, such as those used to detect estrogenic chemicals, can detect target chemicals at low concentrations and can be used to analyze chemical mixtures without a priori knowledge of the mixture components. However, the outputs of these assays are affected by biological variability, which complicates their interpretation. Here, we describe and demonstrate a workflow that is useful for determining potential sources of biological variability and optimizing the performance of cell-based assays. The workflow involves developing an appropriate mathematical model for a transcriptional activation assay, calibrating it with experimental data, and conducting sensitivity analysis to characterize individual components of the genetic circuit based on their effect on the reporter signal output. This workflow was tested using an estrogen receptor transcriptional activation assay. For this circuit, our analysis predicts that controlling estrogen response element number, promoter strength, and reporter signal degradation rates minimizes reporter output variability. We show that careful model development, calibration, and analysis can offer biologically relevant insights to minimize the variability of cell-based assays and improve genetic circuits for increased sensitivity and dynamic range.

2
Evaluating AI-Assisted Customer Verification for Synthetic Nucleic Acid Screening

Acelas, A.; Palya, H.; Flyangolts, K.; Fady, P.-E.; Nelson, C.

2026-03-01 synthetic biology 10.64898/2026.02.27.708645 medRxiv
Top 0.1%
8.7%

Legitimacy screening, the process of verifying the identity and purpose of customers ordering synthetic nucleic acids, is a primary safeguard against the misuse of synthetic biology. However, the associated costs discourage the adoption of screening practices. To evaluate whether AI tools can facilitate this process, we tested five large language models on five verification tasks using customer profiles of life sciences researchers from around the world. We compared AI performance against an expert human baseline on flag accuracy, source quality, source fidelity, and cost. The best-performing model, Gemini 2.5 Pro aided by four bibliographic and sanctions APIs, achieved comparable flag accuracy to the human baseline (90% and 89%, respectively). Gemini 2.5 Pro outperformed the human baseline on source quality and fidelity, at roughly one-tenth of the cost ($1.18 vs. $14.04 per customer). For information-gathering tasks, which excluded the human review step, costs averaged $0.23 per customer, around 50 times cheaper than human screening. These results support piloting AI-assisted legitimacy screening at providers of synthetic nucleic acids and other dual-use biotechnology products, with AI systems handling information gathering and human reviewers retaining authority over order fulfillment decisions.

3
ro-crate-rs: Development of a Lightweight RO-Crate Rust Library for Automated Synthetic Biology

Burridge, M. S.; Ou, Z.; James, K.; Lim, J.; Buldum, G.; Finnigan, J.; Charnock, S. J.; Wipat, A.

2026-01-22 synthetic biology 10.64898/2026.01.22.701040 medRxiv
Top 0.1%
6.2%

Advances in laboratory automation and AI-driven experimental design have increased the scale and complexity of data generated in synthetic biology. Whilst biofoundries provide significant resources and infrastructure to execute these experiments, most laboratories rely on isolated automated instruments and software systems that operate as disconnected silos, producing heterogeneous data formats with little structured metadata. This fragmentation hinders data integration, reproducibility, and downstream computational workflows. A potential solution is RO-Crate, which offers a lightweight, extensible framework for packaging research data with machine-readable metadata, but existing tooling remains immature for automation-orientated, cloud-native, or high-throughput laboratory workflows. Here, we introduce ro-crate-rs, a new suite of tools centred on a performant Rust library for constructing, validating and packaging RO-Crates across diverse compute environments and automated hardware. The library enforces RO-Crate 1.1 constraints through strong typing while enabling flexible extensions, and is complemented by a Python API and CLI for interactive use and pipeline integration. We demonstrate this combined approach through a semi-automated Old Yellow Enzyme characterisation workflow, showing how RO-Crates can capture data and metadata across multiple independent instruments. Together, these tools provide a robust foundation for FAIR-compliant, automation-ready data management and enable reproducible reconstruction of experimental workflows even in non-biofoundry settings. Availability: https://github.com/intbio-ncl/ro-crate-rs

4
Nucleotide-level chemical reaction network modeling enables quantitative prediction of reconstituted cell-free expression system

Jurado, Z.; Pandey, A.; Murray, R. M.

2026-02-23 synthetic biology 10.64898/2026.02.22.707325 medRxiv
Top 0.1%
4.8%

Cell-free expression systems offer a method for rapid prototyping of DNA circuits and functional protein synthesis. While crude extracts remain a black box with many components carrying out unknown reactions, PURE contains only the required transcription and translation components for protein production. All proteins and small molecules are at known concentrations, enabling detailed modeling for reliable computational predictions. However, there is little to no experimental data supporting the expression of target proteins for PURE-based models. In this work, we generalized the PURE detailed translation model for proteins with arbitrary amino acid compositions and lengths. We then built a chemical reaction network for transcription in PURE, validating the transcription models using DNA expression for the malachite-green aptamer (MGapt) to measure mRNA production. Lastly, we coupled the transcription and the generalized translation models to create a PURE protein synthesis model built purely of mass-action reactions. We used the combined model to capture the kinetics of MGapt and deGFP expressed from plasmids at varying concentrations.

5
Using a GPT-5-driven autonomous lab to optimize the cost and titer of cell-free protein synthesis

Smith, A. A.; Wong, E. L.; Donovan, R. C.; Chapman, B. A.; Harry, R.; Tirandazi, P.; Kanigowska, P.; Gendreau, E. A.; Dahl, R. H.; Jastrzebski, M.; Cortez, J. E.; Bremner, C. J.; Hemuda, J. C. M.; Dooner, J.; Graves, I.; Karandikar, R.; Lionetti, C.; Christopher, K.; Consiglio, A. L.; Tran, A.; McCusker, W.; Nguyen, D. X.; Nunes da Silva, I. B.; Bautista-Ayala, A. R.; McNerney, M. P.; Atkins, S.; McDuffie, M.; Serber, W.; Barber, B. P.; Thanongsinh, T.; Nesson, A.; Lama, B.; Nichols, B.; LaFrance, C.; Nyima, T.; Byrn, A.; Thornhill, R.; Cai, B.; Ayala-Valdez, L.; Wong, A.; Che, A. J.; Thavaraj

2026-02-05 synthetic biology 10.64898/2026.02.05.703998 medRxiv
Top 0.1%
3.6%

We used an autonomous lab, comprising a large language model (LLM) and a fully automated cloud laboratory, to optimize the cost efficiency of cell-free protein synthesis (CFPS). By conducting iterative optimization, the LLM-driven autonomous lab was able to achieve a 40% reduction in the specific cost ($/g protein) of CFPS relative to the state of the art (SOTA). This cost reduction was accompanied by a 27% increase in protein production titer (g/L). Iterative experimental design, experiment execution, data capture and analysis, data interpretation, and new hypothesis generation were all handled by the LLM-driven autonomous lab. The interface between OpenAI's GPT-5 LLM and Ginkgo Bioworks' cloud laboratory incorporated built-in validation checks via a Pydantic schema to ensure that AI-designed experiments were properly specified. Experimental designs were translated into programmatic specification of multi-instrument biological workflows by Ginkgo's Catalyst software and executed on Ginkgo's Reconfigurable Automation Cart (RAC) laboratory automation platform, with human intervention largely limited to reagent and consumables preparation, loading and unloading. By integrating LLMs with programmatic control of a cloud lab, we demonstrate that an LLM-driven autonomous lab can successfully perform a real-world scientific task, highlighting the potential of AI-driven autonomous labs for scientific advancement.

6
Integration-coupled activation of promoterless combinatorial pathway libraries in Clostridium avoids burden during DNA assembly

Mordaka, P. M.; Williamson, J.; Heap, J. T.

2026-01-21 synthetic biology 10.64898/2026.01.20.700586 medRxiv
Top 0.1%
3.1%

Combinatorial DNA design and assembly is an efficient and pragmatic way to obtain high-performing metabolic pathway designs quickly. However, implementation may require organism-specific technical barriers to be overcome. Firstly, suitable expression control parts such as promoters and ribosome-binding sites (RBSs) which provide a suitable range of expression levels need to be identified or developed. Secondly, these need to be assembled into pathway-encoding combinatorial libraries of sufficient size, quality and diversity. For organisms with transformation frequencies too low to allow direct transformation of library assembly reactions, such as many Clostridium spp., assembly and amplification is typically carried out using Escherichia coli. However, if constructs are deleterious (or burdensome) to E. coli, which is often the case when using Clostridium genetic parts, poor libraries may be obtained. Here we develop a new approach called integration-coupled activation of promoterless sequences (ICAPS) to overcome this barrier and therefore enable combinatorial assembly in Clostridium. Libraries were designed and assembled as promoterless synthetic operons, preventing expression during DNA assembly, and expression was only activated later, when constructs were integrated into the host genome downstream of a promoter. Variation of expression levels was achieved using a range of context-resistant RBS sequences. This approach was used to produce a Clostridium acetobutylicum library with combinatorial expression variants of an introduced hexanol pathway. This proof of concept provides a generally-applicable approach to implement combinatorial metabolic pathway-encoding libraries in Clostridium spp., circumventing the excessive strength of Clostridium expression control parts in E. coli, and is applicable to other organisms.

7
Refactored genetic parts for modular assembly of the E. coli MccV type I secretion system used to screen class II microcin candidates from plant-associated bacteria

Morton, A. K.; Chaudhari, K.; Matibag, B. D.; Iyengar, V. B.; Dullen, K. E.; VanDieren, A. J.; Parker, J. K.; Mishler, D. M.; Barrick, J. E.

2026-01-20 synthetic biology 10.64898/2026.01.19.700402 medRxiv
Top 0.1%
2.7%

Background: Microcins are small antibacterial proteins secreted by gram-negative bacteria. The activities of new microcins discovered using bioinformatic searches need to be validated and characterized to understand how they mediate competition in microbiomes and to evaluate their potential as new therapeutics for combating antibiotic resistance. Engineered plasmids containing the type I secretion system associated with Escherichia coli Microcin V (MccV) can secrete heterologous proteins, including other class II microcins, and this system functions in other bacterial hosts. However, existing microcin secretion constructs are not designed for easily swapping components -- such as origins of replication, resistance genes, promoters, and signal peptides -- that may need to be changed for compatibility with other chassis. Results: We refactored the E. coli MccV type I secretion system into genetic parts compatible with a modular Golden Gate assembly scheme and used these parts to construct two-plasmid microcin secretion systems. In our design, one plasmid encodes the type I secretion system proteins, and the other encodes a signal peptide fused to the cargo protein that will be secreted. We tested two versions of a system with inducible promoters separately controlling expression of the components on each plasmid. One used plasmids that replicate in E. coli and its close relatives. The other used broad-host-range plasmids. When induced to secrete MccV, both versions produced similar zones of inhibition against a susceptible strain of E. coli. Next, we identified putative class II microcins in genomes of bacteria from plant-associated genera (Pantoea, Erwinia, and Xanthomonas) using an existing bioinformatics pipeline. We screened 23 of these putative microcins for E. coli self-inhibition. Seven exhibited some inhibition, mostly later in growth curves, but none had effects that were comparable in strength to MccV.
Conclusions: The genetic parts we created can be assembled in various combinations into tailored systems for secreting small proteins from diverse bacterial chassis. These systems can be used to further characterize the targets of novel microcins and to secrete these or other small proteins for various applications. For example, beneficial bacteria used in crop protection could be engineered to secrete microcins that kill or inhibit plant pathogens to increase their efficacy.

8
Fundamental limitations of genomic language models for realistic sequence generation

Tzanakakis, A.; Mouratidis, I.; Georgakopoulos-Soares, I.

2026-01-18 synthetic biology 10.64898/2026.01.17.700093 medRxiv
Top 0.1%
2.3%

Large language models (LLMs) have shown remarkable success in natural language processing, prompting interest in their application to genomic sequence analysis. Genomic Language Models (gLMs) based on similar architectures offer a promising avenue for synthetic genome generation and characterization. However, their effectiveness for biological sequence modeling remains poorly characterized. We present a comprehensive evaluation of genomic language models that explicitly aim to generate entire synthetic genomes. We tested Evo 2 on diverse prokaryotic, eukaryotic and viral genomes, and megaDNA on bacteriophage genomes, and assessed performance across key biological features and organizational patterns. Our results reveal systematic failures in gLM-based genomic reconstruction. While the synthetic sequences captured local sequence statistics, they consistently failed to preserve long-range genomic organization, repeat and k-mer composition, transcription factor binding site architecture, and evolutionary constraints. Generated sequences exhibited violations of natural genomic patterns and models showed particular difficulty with repetitive elements. To assess the quality of genome generation, we trained a convolutional neural network that reliably distinguished synthetic from natural sequences, achieving AUROC values up to 0.97 in eukaryotes and 0.82 in prokaryotes, with classification accuracy increasing monotonically with genomic distance from the seed. These findings suggest fundamental limitations in current gLM architectures for capturing the long-range, hierarchical nature of genomic sequences. 
Our work highlights the need for specialized architectures that explicitly model evolutionary constraints rather than relying solely on statistical patterns, with important implications for computational biology applications requiring realistic sequence generation and for biosafety assessments that depend on the distinguishability of synthetic and natural genomic sequences.

9
Improved vector toolkit for genome writing in mammalian cells

Barriball, K.; Berrios, B.; Pinglay, S.; Zhao, Y.; Chalhoub, N.; Tsou, T.; Atwater, J. T.; Boeke, J. D.; Zhang, W.; Brosh, R.

2026-03-16 synthetic biology 10.64898/2026.03.15.711894 medRxiv
Top 0.1%
2.1%

Efficient genome writing in mammalian cells requires robust methods for integrating large DNA payloads. The previously described method mammalian Switching Antibiotic resistance markers Progressively for Integration (mSwAP-In) enables iterative, biallelic genome rewriting in mammalian stem cells with DNA payloads exceeding 100 kb. However, the lack of standardized vectors and certain technical constraints have limited its broader adoption. Here we present an improved plasmid toolkit designed to streamline the implementation of mSwAP-In. The toolkit includes two core vectors. pLP-TK (pCTC174) is a landing-pad plasmid compatible with Golden Gate assembly of genomic homology arms and supports both mSwAP-In and the recombinase-mediated cassette exchange method Big-IN. mSwAP-In MC2v2 (pKBA135) is a versatile Big DNA assembly and delivery vector that supports Gibson-based assembly and incorporates positive, negative, and fluorescent selection markers, as well as a backbone counterselection cassette to minimize unwanted plasmid integration. The vector architecture also enables propagation in yeast and bacterial hosts, inducible plasmid copy-number amplification in standard E. coli strains, and CRISPR/Cas9-mediated payload release through preinstalled guide RNA target sites. We further characterize the FCU1/5-FC counterselection system in mouse embryonic stem cells and define conditions that minimize its bystander toxicity. Finally, we provide a set of Cas9-gRNA expression plasmids optimized for common mSwAP-In applications. Together, these reagents constitute a standardized and experimentally validated toolkit that simplifies large-scale genome writing using mSwAP-In.

10
Outpacing E. coli: Development of Vibrio natriegens as a Next-Generation Cloning Host

Wei, E.; Louie, M.; Dessimoz, E.; Orona, C.; Smith, N.; Holste, N.; Slind, M.; Nguyen, H.; Anandhan, S.; Kallivalappil, S. T.; Weinstock, M. T.

2026-02-12 synthetic biology 10.64898/2026.02.11.705132 medRxiv
Top 0.1%
2.1%

Despite transformative advances in DNA synthesis, sequencing, and automation that have accelerated recombinant DNA workflows, molecular cloning hosts have scarcely evolved past the Escherichia coli strains adopted out of convenience in the 1970s. We present NBx CyClone - an engineered strain of Vibrio natriegens - as a next-generation host for molecular cloning. This non-pathogenic marine bacterium combines broad plasmid and genetic tool compatibility, a versatile metabolism, and the fastest known doubling time of any free-living organism. By shortening growth-dependent steps, this host offers a practical route to faster, more efficient recombinant DNA workflows across research and industry.

11
Optimization of PURE system composition using automation and active learning

Bernard-Lapeyre, Y.; Cleij, C.; Sakai, A.; Huguet, M.-J.; Danelon, C.

2026-03-25 synthetic biology 10.64898/2026.03.23.713685 medRxiv
Top 0.1%
2.0%

The protein synthesis using recombinant elements (PURE) system has been widely applied in various biological research fields and synthetic cell construction. Optimization efforts to enhance the PURE system performance by adjusting its individual components have remained limited to the expression of single genes with a small number of molecular compositions tested, making it difficult to link component composition to system-level performance across different DNA contexts. Here, we combine automated acoustic liquid handling with an active learning framework to explore broadly the compositional landscape of the PURE system. By grouping the 69 individual components (including proteins and tRNAs) into 21 functional sets and iteratively guiding experiments with active learning, we rapidly identify improved compositions and demonstrate up to 3-fold enhancement in protein yield and translation rate for a single reporter gene. We further show that optimization drivers differ between low and high DNA concentrations, revealing that optimal PURE compositions are DNA concentration-dependent. We then apply this optimization strategy to enhance the expression of a 41-kb synthetic chromosome containing 15 genes by maximizing the fluorescence intensities of two reporter proteins. While a 3-fold improvement could be reached on the two gene products guiding learning, a full proteomic analysis revealed that optimization is gene-specific, i.e., changes in PURE system compositions differently impact the amounts of synthesized proteins encoded on the same DNA template. Together, this work establishes active learning as an efficient strategy to navigate the high-dimensional PURE compositional space and provides mechanistic insight into DNA context-dependence of gene expression optimization.

12
TargetMITO: A rule-based model for generating highly functional synthetic mitochondrial targeting sequence in yeast

Gombeau, K.; Wan, R.; James, J. S.; Tribouillard-Tanvier, D.; Cai, Y.

2026-02-23 synthetic biology 10.64898/2026.02.22.707306 medRxiv
Top 0.1%
1.8%

Mitochondria are essential organelles containing their own genomes, encoding a few proteins essential for energy production. Most of the mitochondrial proteins are nucleus-encoded, translated as precursors in the cytoplasm, with a large fraction of these precursors properly addressed by an N-terminal mitochondrial targeting sequence (MTS). These MTS share common features but no consensus sequence can explain their functionality nor the precursors-specific determinants of mitochondrial import. To decipher this mechanism, we created a simple computational model to generate highly functional synthetic MTS while maintaining a tight control on the design parameters. Using the budding yeast, we demonstrated the presence of precursors-specific signatures in addressing artificially nucleus-relocated OXPHOS proteins. We also show the ability of six promising candidate synthetic MTS to address a fluorescent reporter to human mitochondria cells. Our research work confirms the uniqueness of the MTS-passenger protein synergy and takes us one step closer towards improving gene therapy-based treatment of mitochondrial diseases.

13
A CURE for synthetic regulation of gene expression: Rapid screening of guide RNA efficacy as a framework for enabling undergraduate research in plant synthetic biology

Bull, T.; Carlsen, L.; Hoglund, N.; Blarr, J.; Ciernia, M.; Daughtrey, H.; Gulnac, K.; Kathan, Z.; Labovitz, B.; Lonergan, R.; McDermott, M.; Medina, A.; Mikol, Z.; Miller, Z.; Prahl, K.; Rifai, C.; Schrems, E.; Shinkawa, F.; Summerfield, J.; Thevarajah, E.; Wagner, S.; Zimmerman, T.; Khakhar, A.

2026-03-31 synthetic biology 10.64898/2026.03.31.715601 medRxiv
Top 0.1%
1.3%

Course-based Undergraduate Research Experiences (CUREs) have emerged as a transformative approach to science education, expanding access to authentic research opportunities beyond the traditional undergraduate research assistant (URA) training. By embedding research into a curriculum, CUREs engage a broad and diverse population of students in a classroom environment that emphasizes experimental design, data analysis, and scientific communication. However, this has been difficult to develop for fields such as plant synthetic biology due to the long timescales of plant transformation. One avenue around this problem is to utilize a recent innovation that enables high throughput and rapid screening of gRNA efficacy by leveraging viral-based delivery of guide RNAs (gRNAs). In this work, we develop and validate a CURE with undergraduate students at Colorado State University (CSU). Students worked in teams to design and test efficacy of gRNAs targeting a Cas9-based transcriptional repressor to different regions of the promoters of the three GIBBERELLIN INSENSITIVE 1 genes (GID1a, GID1b, and GID1c) in Arabidopsis thaliana. Over the semester, students generated and analyzed gene expression data to understand the efficiency of twelve new gRNAs. We further validated CURE student-identified gRNAs with an undergraduate research assistant (URA) that assessed target gene expression and phenotypic outcomes in stable transgenic lines expressing SynTF constructs with the strongest gRNAs from the class. We further describe the curriculum structure to facilitate adoption at other institutions and present student-generated datasets demonstrating the utility of ViN-based screening for identifying effective SynTF gRNAs for plant functional genomics and engineering. 

14
The Limits of Sequence-Based Biosecurity Screening Tools in the Age of AI-Assisted Protein Design

Wittmann, B. J.; Wheeler, N. E.; Murphey, S. T.; Mitchell, T.; Magalis, B.; Gemler, B.; Flyangolts, K.; Diggans, J.; Clore, A.; Beal, J.; Bartling, C.; Alexanian, T.; Horvitz, E.

2026-03-05 synthetic biology 10.64898/2026.03.04.709671 medRxiv
Top 0.1%
1.2%

Rapid advancements in AI have enabled significant progress in protein and nucleic acid design, but they also pose biosecurity challenges. We examine the vulnerabilities of biosecurity screening software (BSS) to AI-reformulated synthetic homologs of proteins of concern (POCs) that have been fragmented into smaller segments. We evaluate four BSS tools that were recently patched to enhance their AI resiliency. Without any further modification, we found that two of the four tools were capable of robustly detecting fragments as short as 50 nucleotides, demonstrating screening capabilities that exceed those requested in the U.S. Framework for Nucleic Acid Synthesis. Upgraded versions of the other two tools improved performance. Although our findings confirm the effectiveness of the tested BSS tools, at the same time, they emphasize the urgency of developing alternate BSS approaches to counter evolving AI-enabled biosecurity risks.

15
CombinGym: a benchmark platform for machine learning-assisted design of combinatorial protein variants

Chen, Y.; Fu, L.; Lu, X.; Li, W.; Gao, Y.; Wang, Y.; Ruan, Z.; Si, T.

2026-03-25 synthetic biology 10.64898/2026.03.24.714074 medRxiv
Top 0.1%
1.2%

Combinatorial mutagenesis is essential for exploring protein sequence-function landscapes in engineering applications. However, while large-scale machine learning benchmarks exist for protein function prediction, they are primarily limited to single-mutant libraries, leaving a critical gap for combinatorial mutagenesis. Here we introduce CombinGym, a benchmarking platform featuring 14 curated combinatorial mutagenesis datasets spanning 9 proteins with diverse functional properties including binding affinity, fluorescence, and enzymatic activities. We evaluated nine machine learning algorithms from five methodological categories (alignment-based, protein language, structure-based, sequence-label, and substitution-based) across multiple prediction tasks, assessing both zero-shot and supervised learning performance using Spearman's ρ and Normalized Discounted Cumulative Gain metrics. Our analysis reveals the substantial impact of measurement noise and data processing strategies on model performance. By implementing hierarchical dataset splits (0-vs-rest, 1-vs-rest, 2-vs-rest, and 3-vs-rest scenarios), we demonstrate the value of lower-order mutation data for empowering machine learning models to predict higher-order mutant properties. We validated this capacity through both in silico simulation (improving fluorescence brightness of an oxygen-independent fluorescent protein) and experimental validation (engineering enzyme substrate specificity), achieving a substantial increase in specific activity. All datasets, benchmarks, and metrics are available through an interactive website (https://www.combingym.org), facilitating collaborative dataset expansion and model development through integration with automated biofoundry platforms.

16
A Yeast Surface Display Platform for Screening Dimeric Mammalian Receptors

Slaton, E. W.; Krivanek, E. C.; Kimmel, B. R.

2026-01-30 synthetic biology 10.64898/2026.01.29.702702 medRxiv
Top 0.1%
0.9%

Discovering proteins that modulate receptor activity remains a key challenge in the field of protein design and engineering. Traditionally, identifying proteins that interact with receptors often relies on binding as a selection criterion, yielding limited information about the function of discovered binders in a library, including the ability to activate or block signaling cascades associated with the receptor of interest. As a result, extensive downstream characterization is required to assess the biological relevance of discovered binders. To address this issue, we have developed a high-throughput screening system to screen dimeric mammalian receptors using yeast surface display. We demonstrate the programmed dimerization of the extracellular domains of mammalian receptors in yeast via engineered induction pathways, thereby enabling receptor expression and the secretion of associated native cytokines. This surface expression of the involved subunits for the protein receptor and cytokine-induced dimerization activity indicates that the receptor has been activated and is expected to trigger a DNA-driven signaling cascade within a mammalian cell. This system provides a modular platform technology that advances existing yeast-display systems, demonstrating the effectiveness of these high-throughput platforms for screening the function of mammalian receptors. This work is expected to provide a rapid, cost-effective approach to the molecular discovery of novel biologics for targeting dimeric mammalian receptors.

17
Cross-strain transferability of CRISPRi systems and design rules from laboratory to clinical Escherichia coli strains

Ban, H.; Rondthaler, S. N.; Lebovich, M.; Lora, M. A.; Ugbesia, B.; Andrews, L. B.

2026-01-29 synthetic biology 10.64898/2026.01.28.702340 medRxiv
Top 0.1%
0.8%

CRISPR interference (CRISPRi) has emerged as a versatile approach for targeted gene repression in many organisms, including microbes and bacteria, due to the simple design of sequence-specific transcriptional silencing of gene expression. However, the strain-specific effects on repression efficiency and the host when translating a CRISPRi system from a laboratory strain to non-model strains are not well understood, yet they can present important limitations to its use. Here, we investigated the repression efficiency and toxicity of three CRISPRi systems (one dCas9 and two dCas12a variants) across four different Escherichia coli strains, including a laboratory K-12 strain (MG1655) and three non-model strains that are clinical isolates (probiotic Nissle 1917, uropathogenic CFT073, and uropathogenic UMN026). We evaluated the repression in each strain using sets of guide RNAs (gRNAs) targeting along the gene sequence and assayed cytotoxicity of expressing each dCas protein. Growth toxicity from expression of the different dCas proteins notably differed and showed high variation between some host strains. We also observed variable repression among the strains and notably poorer repression in multiple clinical strains. Therefore, we developed a dual gRNA CRISPRi system for enhanced gene silencing among the strains, which achieved up to 824-fold repression in CFT073. The results demonstrate that strain-specific design considerations can arise when a CRISPRi genetic system is transferred to a closely related bacterial strain. These findings provide insight into the relationships between criteria used for CRISPRi genetic design and in vivo activity across non-model E. coli strains, providing guidelines for diverse applications of these tools.

18
CyanOperon: an operon building expansion for the CyanoGate MoClo toolkit

Astbury, M. J.; Schiavon Osorio, A. A.; Victoria, A. J.; McCormick, A. J.

2026-02-24 synthetic biology 10.64898/2026.02.24.707249 medRxiv
Top 0.1%
0.8%

Operons are gene clusters controlled by a single promoter that enable coordinated translation from a single messenger RNA. Here we describe an expansion of the CyanoGate MoClo toolkit to assemble synthetic operons. The versatile CyanOperon system includes two Level 0 acceptor vectors for building interchangeable promoter-ribosome binding site (RBS) combinations and 15 Level 1 acceptor vectors for the hierarchical assembly and expression of up to six genes within a single operon. The system also allows for operon assembly into a self-replicating vector or for chromosomal integration by homologous recombination. To showcase CyanOperon, we assembled the violacein biosynthesis pathway as an operon and demonstrated violacein production in Escherichia coli. We then constructed a 20-part RBS library to examine how spacer length between the Shine-Dalgarno sequence and start codon affects translation in E. coli and the model cyanobacterium Synechocystis sp. PCC 6803. Lastly, we compared the expression of up to three operonic fluorescent markers following chromosomal integration or from a self-replicating vector in E. coli and Synechocystis sp. PCC 6803. The CyanOperon system is publicly available and can be readily integrated with other MoClo systems to accelerate the development of standardized operon assemblies.

19
Gain-Scheduled Optogenetic Feedback for Disturbance Rejection in Bacterial Batch Cultures

Namboothiri, H. R.; Hu, C. Y.

2026-04-05 synthetic biology 10.64898/2026.04.04.716495 medRxiv
Top 0.1%
0.8%

Precise regulation of gene expression in batch bacterial cultures is challenging because the underlying dynamics vary with cellular physiological state over time. Although cell-silicon systems enable rapid, real-time optogenetic control, disturbance rejection remains difficult in batch culture because the plant dynamics shift across growth phases, limiting the effectiveness of fixed-gain controllers designed under constant-growth assumptions. Here, we present a multiscale model-guided feedback control framework for disturbance rejection in batch E. coli cultures. Frequency-response analysis shows that the input-output dynamics of gene expression depend strongly on growth phase, revealing operating-point-dependent limits on the disturbance rejection performance of a fixed-gain PID controller. To address this limitation, we develop two growth-aware control strategies: a gain-scheduled PID (PID-GS) controller that adapts to cellular physiological state, and a gain-scheduled feedback-feedforward controller (PID-GS-FF) that further compensates for growth perturbations. We also introduce a controller evaluation framework that identifies three distinct operating regimes for targeted experimental validation. Together, these results show that accounting for growth-state-dependent dynamics is necessary for robust disturbance rejection in batch culture and provide a control-oriented framework for regulating living systems with shifting operating conditions.

20
An integrated synthetic biology and robotics approach for neutralising landmines in post-war communities

Basti, Y.; Williams, S.; Aellen, E.; Muci, F.; Amri, I.; Davila, A.; Schluter, A.; Dao, A.; Meyer, P.; Dembska, J.; Smith, R. C.; McCabe, B. D.

2026-01-21 synthetic biology 10.64898/2026.01.20.700574 medRxiv
Top 0.1%
0.8%

Unexploded ordnances (UXOs) and landmines endanger lives and hinder the economic progress of communities living in post-conflict zones. Currently, the primary method for clearing UXOs relies on metal detection and manual removal of UXOs - an expensive, time-consuming, and hazardous process. This study, derived from the 2024 EPFL iGEM project SYNPLODE, presents a new approach that integrates synthetic biology and aerial drone robotics, proposing a novel, end-to-end, safe, and efficient solution to address UXOs. Starting from bacteria engineered to detect and degrade 2,4,6-trinitrotoluene (TNT), a common explosive in landmines, our solution is designed for three main tasks: detecting TNT and RDX, breaking these compounds down into non-explosive byproducts, and confirming explosive neutralisation. To deploy this solution safely in UXO-contaminated areas, we designed, built, and tested an aerial drone capable of spraying explosive-degrading bacteria. Combining synthetic biology, robotics, mathematical modelling, and affected community engagement, our solution aims to improve UXO and landmine clearance by offering a scalable and cost-effective approach for deactivating UXOs without risking human lives.